Appendix for Unsupervised Motion Representation Learning with Capsule Autoencoders
The table below lists the notations grouped by module; the values used in our implementation are shown where applicable. The necessity of a two-layer hierarchy is briefly discussed in Section 3.3: in short, a single-layer hierarchy struggles to capture long-term dependencies and variations. This section describes an empirical study comparing MCAE with its single-layer counterpart.
Unsupervised Motion Representation Learning with Capsule Autoencoders
We propose the Motion Capsule Autoencoder (MCAE), which addresses a key challenge in the unsupervised learning of motion representations: transformation invariance. At the lower level, a spatio-temporal motion signal is divided into short, local, semantic-agnostic snippets. At the higher level, the snippets are aggregated to form full-length, semantic-aware segments. At both levels, we represent motion with a set of learned transformation-invariant templates and the corresponding geometric transformations, using capsule autoencoders of a novel design. This leads to a robust and efficient encoding of viewpoint changes.
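To make the template-plus-transformation idea concrete, here is a minimal NumPy sketch of the decoding direction only: a snippet is reconstructed by blending learned transformation-invariant trajectory templates and then applying a predicted similarity transform. All names (`decode_snippet`, `K`, `L`, the choice of a 2-D similarity transform) are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical sketch of capsule-style decoding: a motion snippet is
# reconstructed from transformation-invariant templates plus an explicit
# geometric transformation (rotation, isotropic scale, translation).
K, L = 4, 8                              # templates, snippet length (illustrative)
rng = np.random.default_rng(0)
templates = rng.normal(size=(K, L, 2))   # stands in for learned 2-D trajectory templates

def decode_snippet(weights, theta, scale, trans):
    """Blend the K templates with capsule mixing weights, then apply a
    similarity transform -- the 'geometric transformation' part."""
    base = np.tensordot(weights, templates, axes=1)       # (L, 2) canonical trajectory
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return scale * base @ R.T + trans                     # (L, 2) observed trajectory

w = np.array([0.7, 0.3, 0.0, 0.0])       # presence/mixing weights over templates
snippet = decode_snippet(w, theta=np.pi / 6, scale=1.5, trans=np.array([1.0, -0.5]))
print(snippet.shape)  # (8, 2)
```

The point of the factorization is that viewpoint changes only move the transformation parameters, while the template weights `w` (the invariant part of the representation) stay fixed.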